Bitmap based algorithms for mining association rules
Abstract
Discovery of association rules is an important problem in data mining. The classical approach is to generate all itemsets whose support (i.e., the fraction of transactions containing the itemset) exceeds a user-given threshold. Most existing algorithms aim at reducing the number of scans over the transaction database, i.e., the I/O overhead. We consider the problem of computing the support efficiently, i.e., we try to optimize both I/O and CPU time. A straightforward way is to maintain, for each itemset, a list of the identifiers of the transactions that contain it, and to use a sort-merge algorithm to intersect the lists of two itemsets. Instead, we propose bitmap-based algorithms. The basic idea is that every (transaction, item) pair is represented by a bit in an index bitmap, and the logical AND operation is used in place of the sort-merge algorithm. We propose two variations of the bitmap-based algorithm: the naïve bitmap algorithm (N-BM) and the hierarchical bitmap algorithm (H-BM). We then compare these two novel algorithms with the classical list-based algorithm. Our experimental and analytical results demonstrate that the H-BM algorithm can outperform the other algorithms by an order of magnitude. Furthermore, it is less memory demanding and can efficiently exploit an existing bitmap index. Finally, we sketch parallel versions of the bitmap algorithms that are very efficient for very large databases (VLDBs).

Keywords: data mining, association rules, bitmap

Résumé: The discovery of association rules is a major problem in data mining. The goal is to find all itemsets whose support (i.e., the fraction of transactions that contain the itemset) exceeds a fixed threshold. While most algorithms concentrate on reducing the number of scans of the database (i.e., the I/O cost), we propose a method for computing supports that optimizes CPU cost without generating additional I/O. Intuitively, the idea rests on building a binary index (bitmap) in which every (transaction, item) pair is represented by a bit. The support of an itemset (A1, A2, ..., Ak) is computed by applying the logical AND operator to the corresponding columns of the index. We propose two variants of the bitmap algorithm: the naïve algorithm (N-BM) and the hierarchical algorithm (H-BM), which is better suited to sparse bitmap indexes. An analytical evaluation and experimental results show that H-BM can deliver a performance gain of an order of magnitude over N-BM as well as over traditional algorithms that intersect lists of transaction identifiers. Moreover, we show that H-BM consumes little main memory and can easily be parallelized.
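To make the contrast in the abstract concrete, the following Python sketch computes the support of an itemset in two ways: by sort-merge intersection of transaction-identifier lists, and by ANDing one bitmap column per item. It is only an illustration under stated assumptions, not the authors' N-BM implementation: the toy database, the function names, and the use of Python integers as bitmaps are all hypothetical, and transaction identifiers are assumed to be the dense integers 0..n-1 so that a transaction's identifier can serve as its bit position.

```python
from typing import Dict, List, Set

# Hypothetical toy database: transaction id -> items bought.
# Assumption: ids are dense integers 0..n-1 (used as bit positions).
TRANSACTIONS: Dict[int, Set[str]] = {
    0: {"A", "B", "C"},
    1: {"A", "C"},
    2: {"B"},
    3: {"A", "B", "C"},
    4: {"C"},
}


def tid_lists(db: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """List-based index: item -> sorted list of transaction ids."""
    lists: Dict[str, List[int]] = {}
    for tid in sorted(db):  # ascending ids keep every list sorted
        for item in db[tid]:
            lists.setdefault(item, []).append(tid)
    return lists


def merge_intersect(xs: List[int], ys: List[int]) -> List[int]:
    """Sort-merge intersection of two sorted id lists."""
    i = j = 0
    out: List[int] = []
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return out


def bitmaps(db: Dict[int, Set[str]]) -> Dict[str, int]:
    """Bitmap index: item -> integer whose bit t is 1 iff transaction t has the item."""
    cols: Dict[str, int] = {}
    for tid, items in db.items():
        for item in items:
            cols[item] = cols.get(item, 0) | (1 << tid)
    return cols


def support_lists(lists: Dict[str, List[int]], itemset: List[str], n: int) -> float:
    """Support of a non-empty itemset via repeated list intersection."""
    tids = lists.get(itemset[0], [])
    for item in itemset[1:]:
        tids = merge_intersect(tids, lists.get(item, []))
    return len(tids) / n


def support_bitmap(cols: Dict[str, int], itemset: List[str], n: int) -> float:
    """Support of a non-empty itemset via bitwise AND of item columns."""
    acc = cols.get(itemset[0], 0)
    for item in itemset[1:]:
        acc &= cols.get(item, 0)
    return bin(acc).count("1") / n  # number of 1 bits = matching transactions


if __name__ == "__main__":
    n = len(TRANSACTIONS)
    print(support_lists(tid_lists(TRANSACTIONS), ["A", "C"], n))
    print(support_bitmap(bitmaps(TRANSACTIONS), ["A", "C"], n))
```

Both functions return the same support. The bitmap version replaces the element-by-element merge loop with one AND per item, which is the substitution the abstract describes. The hierarchical variant (H-BM) is not sketched here because the abstract does not describe its layout.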
Similar resources
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...
Group Bitmap Index: A Structure for Association Rules Retrieval
Discovery of association rules from large databases of item sets is an important data mining problem. Association rules are usually stored in relational databases for future use in decision support systems. In this paper, the problem of association rules retrieval and item sets retrieval is recognized as the subset search problem in relational databases. The subset search is not well supported ...
A fast association rule algorithm based on bitmap and granular computing
Mining association rules from databases is a time-consuming process. Finding the large item sets quickly is the crucial step in an association rule algorithm. In this paper we present a fast association rule algorithm (Bit-AssocRule) based on granular computing. Our Bit-AssocRule doesn’t follow the generation-and-test strategy of the Apriori algorithm and adopts the divide-and-conquer strategy, thus av...
Class Association Rules Mining based Rough Set Method
This paper investigates the mining of class association rules with the rough set approach. In data mining, an association occurs between two sets of elements when one element set occurs together with another. A class association rule set (CARs) is a subset of association rules with classes specified as their consequences. We present an efficient algorithm for mining the finest class rule se...
An Efficiently Algorithm for Mining Association Rules
Association rules mining is one of the most important topics in data mining. A new algorithm for mining association rules is proposed in this paper. In data mining, counting the support of any itemset incurs a large I/O and computing cost. An impacted bitmap technique to speed up the counting process is employed in this paper. Nevertheless, saving the intact bitmap usually has a big...
Publication date: 1998